Chronogram or phylogram for ancestral state estimation? Model-fit statistics indicate the branch lengths underlying a binary character's evolution
Abstract
Ancestral state estimation (ASE), or ancestral reconstruction, is the process of estimating the evolutionary history of a character on a phylogeny (Cunningham et al., 1998; Donoghue, 1989; Swofford & Maddison, 1987). This has been used to reveal the evolution of key innovations across the tree of life, such as the orb web in spiders (Kallal et al., 2020) and reproductive mode in squamates (Pyron & Burbrink, 2014), and to explore the early evolution of major clades such as flowering plants (Sauquet et al., 2017) and eukaryotes (Skejo et al., 2021). In fact, much of what we know about the dynamics of morphology, ecology and biogeography over deep time-scales relies on ASE algorithms. The earliest methods used Maximum Parsimony, reconstructing states by minimizing the number of changes over the tree without taking branch lengths into consideration (Maddison, 1991; …). Newer methods based on Maximum Likelihood and Bayesian Inference use models that incorporate branch lengths, and can account for rate heterogeneity, speciation and extinction rates, and phylogenetic uncertainty (Beaulieu et al., 2013; Huelsenbeck & Bollback, 2001; Maddison et al., 2007; Pagel, 1999b; Pagel et al., 2004).

A remaining issue that has received little attention is how to choose between alternative branch-length sets when conducting ASE with model-based methods. For instance, researchers must decide whether to use a phylogram, a tree whose branch lengths represent the amount of change, or a chronogram, a tree whose branch lengths represent time (Cascini et al., 2019; Cusimano & Renner, 2014; Litsios & Salamin, 2012). Chronograms are usually used for ASE, perhaps because of an a priori expectation that the probability of change in a given character (e.g. a morphological trait) should depend on the time elapsed. However, studies have shown that rates of morphological evolution can also correlate strongly with molecular rates (Seligmann, 2010), and examples now exist of characters for which, via comparison with secondary evidence, ASE performed with a phylogram was more accurate (…; Cascini et al., 2019). The choice of branch lengths, therefore, remains a broadly relevant problem.

Only a handful of studies have explored this issue. Litsios and Salamin (2012) first demonstrated the sensitivity of ASEs to branch-length choice using simulations. They simulated paired chronograms and phylograms, evolved a continuous character on one of them, and looked at the accuracy of ASEs conducted with each set. They found that ASEs were indeed more accurate when using the set underlying the character's evolution.
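Why branch lengths matter for model-based ASE can be seen on a three-tip tree. The sketch below is illustrative only (the study itself worked in R); it computes marginal state probabilities at the root under a two-state Markov (Mk) model using Felsenstein's pruning algorithm with a flat root prior. The tuple tree encoding, the function names and the rate value 0.3 are hypothetical choices, and the "chronogram" and "phylogram" differ only in the length of the branch leading to tip C.

```python
import math

# Hypothetical tree encoding: (name, length_of_branch_above_node, [children]).

def p_matrix(q01, q10, t):
    """2x2 transition-probability matrix P(t) = exp(Qt) for a two-state Mk model."""
    s = q01 + q10
    decay = math.exp(-s * t)
    p00 = (q10 + q01 * decay) / s
    p11 = (q01 + q10 * decay) / s
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]

def conditional_likelihoods(node, tip_states, q01, q10):
    """Felsenstein pruning: L[s] = P(tip data below node | node in state s)."""
    name, _, children = node
    if not children:
        return [1.0 if s == tip_states[name] else 0.0 for s in (0, 1)]
    lik = [1.0, 1.0]
    for child in children:
        child_lik = conditional_likelihoods(child, tip_states, q01, q10)
        P = p_matrix(q01, q10, child[1])  # child[1] is the branch length
        for s in (0, 1):
            lik[s] *= sum(P[s][x] * child_lik[x] for x in (0, 1))
    return lik

def root_marginal(tree, tip_states, q01, q10):
    """Marginal state probabilities at the root under a flat root prior."""
    lik = conditional_likelihoods(tree, tip_states, q01, q10)
    total = sum(lik)
    return [v / total for v in lik]

tips = {"A": 0, "B": 1, "C": 0}
chronogram = ("root", 0.0, [("X", 1.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 2.0, [])])
phylogram = ("root", 0.0, [("X", 1.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 0.2, [])])
print(root_marginal(chronogram, tips, 0.3, 0.3))  # ≈ [0.651, 0.349]
print(root_marginal(phylogram, tips, 0.3, 0.3))   # ≈ [0.943, 0.057]
```

Because A and B disagree and sit symmetrically, the root estimate is driven by tip C; the shorter branch to C in the phylogram makes the root more confidently share C's state, even though topology, tip data and rates are identical.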
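Litsios and Salamin (2012) also found a positive correlation between phylogenetic signal, estimated with Blomberg's K (Blomberg et al., 2003) and Pagel's λ (Pagel, 1999a), and reconstruction accuracy, and therefore proposed that researchers should use the branch-length set that returns the highest phylogenetic signal for ASE. Cusimano and Renner (2014) then explored the effect of branch-length choice on discrete characters, using plant chromosome number datasets. They showed that this choice could influence results, but did not find evidence that the λ statistic (which can be applied to discrete characters, although this is controversial; see Harmon, 2018) had any utility in choosing between alternative branch-length sets for discrete characters. Although their findings were based on few datasets, they highlighted the need for further investigation.

In this study, we address this need with a simulation study that (a) focuses on discrete, binary characters and (b) assesses in detail several potential test statistics for identifying the branch-length set underlying a character's evolution. We test three phylogenetic signal statistics: λ, Fritz's D (Fritz & Purvis, 2010) and Borges' δ (Borges et al., 2019); the latter two have not been investigated previously in this context. We also test an alternative: model-fit statistics. These are already used in comparative phylogenetics to compare macroevolutionary models, including models incorporating branch-length transformations, in which case the transformations incur a penalty for the added parameter during model-fitting. Although no such penalty is incurred when comparing alternative branch-length sets, we hypothesized that model-fit statistics may nonetheless indicate which set is more closely correlated with the character's evolution, as this should result in higher likelihood values. We test AICc (Akaike, 1974; Hurvich & Tsai, 1989) and BIC (Schwarz, 1978). Because we show that these statistics usually identify the correct set, we also investigate their relationship with error, and identify tree- and character-based properties that affect the choice of branch lengths for a character.

All analyses were conducted in the R statistical language (R Core Team, 2019) using RStudio (RStudio, …). We made extensive use of the tidyverse (Wickham et al., …) and the packages ape (Paradis & Schliep, …; Paradis et al., 2004), castor (Louca & Doebeli, 2018), geiger (Harmon et al., 2008), FossilSim (Barido-Sottani et al., …), phangorn (Schliep, 2011) and phytools (Revell, …), and used gridExtra (Auguie, …) to construct figures. Additional packages are listed at the beginning of each section.

Packages: TreeSimGM (Hagen & Stadler, 2018). We generated 5,000 ultrametric trees, representing the ‘chronograms’ of each replicate. Speciation was modelled as an age-dependent process following a Weibull distribution with a shape of 0.4, to produce trees close to empirical datasets (…, 2015). Tree size was randomly chosen from a uniform distribution between 10 and 1,000 taxa.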
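The two model-fit statistics tested here can be computed from each fit's maximized log-likelihood. The sketch below is a minimal illustration, not the authors' code: it scores the four fits per character (equal-rates "ER" and different-rates "ARD" models on each branch-length set) with AICc and BIC and returns the lowest-scoring one. The `Fit` class, the log-likelihood values, and the choice of the number of tips as the sample size `n` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Fit:
    branch_set: str  # "chronogram" or "phylogram"
    model: str       # "ER" (equal rates) or "ARD" (different rates)
    log_lik: float   # maximized log-likelihood of the fitted model
    k: int           # number of free rate parameters (1 for ER, 2 for ARD)

def aicc(log_lik, k, n):
    """Small-sample corrected AIC (Hurvich & Tsai): AIC + 2k(k+1)/(n-k-1)."""
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian information criterion (Schwarz): k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * log_lik

def best_fit(fits, n, criterion=aicc):
    """Return (branch_set, model) of the fit with the lowest criterion value."""
    best = min(fits, key=lambda f: criterion(f.log_lik, f.k, n))
    return best.branch_set, best.model

# Hypothetical log-likelihoods for one binary character on 50 tips (n assumed = tips).
fits = [
    Fit("chronogram", "ER", -30.2, 1),
    Fit("chronogram", "ARD", -29.9, 2),
    Fit("phylogram", "ER", -27.5, 1),
    Fit("phylogram", "ARD", -27.3, 2),
]
print(best_fit(fits, n=50))                 # ('phylogram', 'ER')
print(best_fit(fits, n=50, criterion=bic))  # ('phylogram', 'ER')
```

For the same model, `k` and `n` are identical on both branch-length sets, so the penalty terms cancel and the chronogram-versus-phylogram comparison reduces to a comparison of likelihoods; the penalties only matter when choosing between ER and ARD.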
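After tree generation, all trees were rescaled to a random depth between 1 and 100, to remove the association between tree size and tree age (larger trees resulted in more ‘evolutionary time’ to generate), so that subsequently simulated rates (see below) spanned a range of values relative to the phylogeny. It should be noted that, while fossil taxa generally improve ASE estimates (Puttick, 2016; Slater et al., 2012), our methodology and results relate exclusively to extant species.

To generate the phylogram corresponding to each chronogram, we transformed its branch lengths (Figure 1). Transformation involved multiplying each branch and its descendant branches by a value drawn from a normal distribution (μ = 1, σ = 0.2), mimicking either an acceleration (if >1) or a deceleration (if <1) in rate. As this was repeated across the tree, a particular branch was affected not only by its own value but also by those of all branches leading to it (i.e. its previous history), so the transformation mimicked phylogenetically autocorrelated rates (Tao et al., 2019). … 2 … original (but … descendent branches) …, truncated at 0.4 (… values ≤0). Each replicate thus had a unique chronogram and phylogram, one of which was chosen at random to represent the ‘correct’ branch lengths.

Characters were simulated under three different models, each with different properties. ‘Markov’ characters were simulated with a standard continuous-time Markov model, with transition rates between 0.05 and 1. As this is the model used to estimate ancestral states (see below), this set represented a ‘best case scenario’ with minimal model misspecification. ‘Hidden Rates’ characters were simulated with ‘hidden rates’ models (…, 2013). In these models, each observable state is divided into ‘hidden’ rate categories. Because transitions between hidden categories are not modelled during estimation, some misspecification is present in this set, allowing us to test the ability of the statistics under less ideal conditions. For ‘Amplified Hidden Rates’ characters, the rates of one hidden category were multiplied by 100, so that ‘slow’ and ‘fast’ categories differed by orders of magnitude, introducing even greater levels of misspecification in the subsequent estimation step. We ensured that both states were present in at least 5% of tips (or at least two tips for phylogenies with <40 tips).

Packages: caper (Orme et al., …), expm (Goulet et al., 2017). We produced marginal ancestral state estimations (ASEs) (Pagel, 1999b). We used two models: one with equal transition rates between states (equal rates), and one in which rates depended on the direction of change (different rates). This produced four ASEs per character: one per model with each of the two branch-length sets. Next, we calculated λ, Fritz's D and δ with each branch-length set. Estimation of … required optimized inferences of the model; for consistency, … (as in the simulation) … statistics.

Packages: randomForest (Liaw & Wiener, 2002). All analyses were conducted separately for the Markov, Hidden Rates and Amplified Hidden Rates datasets unless otherwise stated. We first assessed whether ASEs done with the correct branch-length set (the lengths underlying the character's evolution) were more accurate. For each replicate, we identified the best-fitting model (equal or different rates), that with the lowest information-criterion value, for both the chronogram and the phylogram. Using these, we calculated the average node error of the resulting ASEs by summing the probabilities assigned to the incorrect state across nodes and dividing by the total number of nodes. As evolutionary questions involving ASE often hinge on a few key nodes rather than on large numbers of nodes, we also compared specifically the five nodes whose probabilities changed most between branch-length sets, calculating a ‘swing error’. We tested whether errors were lower with the correct branch lengths.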
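The phylogram construction described above can be sketched as a recursive rescaling of a chronogram. This is a hypothetical reconstruction, not the study's code: some details of the transformation are lost in this copy of the text, so the sketch keeps only the core idea. Each branch draws a multiplier from N(1, 0.2) and inherits the product of its ancestors' multipliers, giving autocorrelated rates. Treating the truncation at 0.4 as rejection of low draws is an assumption, as are the tuple tree encoding and the function names.

```python
import random

# Hypothetical tree encoding: (name, branch_length, [children]); the root's
# branch length is left untouched.

def draw_multiplier(rng, mu=1.0, sigma=0.2, lower=0.4):
    """Draw a rate multiplier from N(mu, sigma), redrawing values below `lower`
    (truncation by rejection is an assumption of this sketch)."""
    while True:
        m = rng.gauss(mu, sigma)
        if m >= lower:
            return m

def _scale_subtree(node, rng, inherited):
    name, length, children = node
    # Each branch carries its own multiplier times those of all ancestral
    # branches, so rates are phylogenetically autocorrelated.
    rate = inherited * draw_multiplier(rng)
    return (name, length * rate,
            [_scale_subtree(child, rng, rate) for child in children])

def chronogram_to_phylogram(tree, seed=None):
    """Return a phylogram with the same topology as `tree`."""
    rng = random.Random(seed)
    name, length, children = tree
    return (name, length, [_scale_subtree(c, rng, 1.0) for c in children])

chronogram = ("root", 0.0, [("X", 1.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 2.0, [])])
print(chronogram_to_phylogram(chronogram, seed=42))
```

Unlike the input, the output is generally no longer ultrametric: root-to-tip distances now differ between lineages, which is what distinguishes a phylogram from a chronogram.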
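While node error was obtained for the entire tree, the following measures were obtained at the level of individual nodes only. First, we extracted node probabilities from all replicates, estimated under the same conditions used to simulate the data (the correct branch lengths and the different-rates model), and, through subtraction of the known (true) states, obtained a proxy of node error and, relative to a scenario in which both states are equiprobable, of precision. Then, the difference between estimates with each branch-length set (chronogram minus phylogram) was used as a measure of the node's sensitivity to branch-length choice. Potential determinants of these response variables (node-based precision and sensitivity) were explored with random forests, which are relatively robust to multicollinearity, lack of independence and deviations from normality. In all, … predictors were calculated for each node, including metrics of depth, number of descendants, surrounding branches, degree of … and brevity, as defined in Table S1. Regression forests for each response variable were built from a sample of 1% of observations (nodes). The importance of each predictor was measured as the decrease in the residual sum of squares introduced by splitting on it (i.e. the increase in node purity), averaged over all trees.

Packages: adephylo (Jombart et al., …), mgsub (Ewing, 2020), PerformanceAnalytics (Peterson & Carl, …), psych (Revelle, …), reshape2 (Wickham, 2007). For D, λ and δ, we chose the branch-length set that led to the highest phylogenetic signal (corresponding to the lowest D, or the highest λ or δ). For AICc and BIC, we chose the set with the best fit (… equal-rates models). For all statistics, we calculated how often they correctly identified chronograms and phylograms, and the biases associated with each. As … showed that selecting … returned almost identical results (Section 3), we explored the behaviour of model-fit statistics more generally. … reductions in score corresponded … only; … (denoted X1 to X7; Table 1) were included in the regression models. Six of these (X1–6) were expected to affect model fit and/or optimization, under the assumption that models that are easier to fit are more effectively optimized. The final predictor (X7) measured overall fit, to determine whether … different. Several predictors (X2–4) were measured on chronograms, for all replicates; although exact values differ slightly, the pattern is the same. Predictors highly correlated among themselves were removed to reduce multicollinearity. The remaining predictors were included in a stepwise multiple logistic regression analysis.

ASEs performed with the correct branch-length set had lower error (paired Wilcoxon: p < 2.2e-16). Average node error was lower with the correct set in 66%–76% of replicates, and swing error in 63%–71% (Figure 2). On average, average node error decreased by 0.4%–0.8% and swing error by 9%–14%. Results were similar … (independent … assumed uncorrelated). Random forests were able to explain a substantial fraction of the variability in node error and precision (33.3% and 25.0%, respectively; Figure 3). Both were highly positively correlated (Spearman's rank correlation: ρ = 0.84), showing that strong support for the incorrect state was rare in our simulations (3.1% of nodes with >0.75 probability for the incorrect state). In accordance, the patterns of dependence on the most important predictors were similar (including measures of patristic distance from the root and of the branch descending from it; decreasing …; S1). On the other hand, nodes experiencing high sensitivity were rare.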
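The tree-level and node-level error measures described in the methods can be sketched for a binary character, where each node's estimate is summarized as P(state 1). This is a minimal illustration under stated assumptions: the function names are hypothetical, the exact definitions of precision and swing error are only partly recoverable from this copy of the text, and swing error is read here as the average error over the n nodes whose probabilities changed most between branch-length sets.

```python
def node_errors(p_state1, true_states):
    """Probability assigned to the incorrect state at each node of a binary
    character: |true state - P(state 1)|."""
    return [abs(s - p) for p, s in zip(p_state1, true_states)]

def average_node_error(p_state1, true_states):
    """Sum of incorrect-state probabilities divided by the number of nodes."""
    errors = node_errors(p_state1, true_states)
    return sum(errors) / len(errors)

def sensitivity(p_chrono, p_phylo):
    """Per-node change in P(state 1) between branch-length sets (chronogram minus phylogram)."""
    return [pc - pp for pc, pp in zip(p_chrono, p_phylo)]

def swing_error(p_chrono, p_phylo, p_eval, true_states, n_nodes=5):
    """Average error of `p_eval` over the n nodes whose probabilities changed most
    between the two branch-length sets (one reading of 'swing error')."""
    change = [abs(d) for d in sensitivity(p_chrono, p_phylo)]
    top = sorted(range(len(change)), key=lambda i: change[i], reverse=True)[:n_nodes]
    return average_node_error([p_eval[i] for i in top], [true_states[i] for i in top])

# Hypothetical probabilities of state 1 at four internal nodes.
p_chrono = [0.9, 0.6, 0.2, 0.55]
p_phylo = [0.95, 0.3, 0.25, 0.5]
truth = [1, 0, 0, 1]
print(average_node_error(p_chrono, truth))  # ≈ 0.3375
print(average_node_error(p_phylo, truth))   # ≈ 0.275
print(swing_error(p_chrono, p_phylo, p_phylo, truth, n_nodes=1))  # ≈ 0.3
```

With few nodes, averaging over all of them dilutes the effect of a single node that flips between sets; restricting attention to the most-changed nodes is what makes swing error far more responsive to branch-length choice.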
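Only 0.17% of nodes experienced a change in probability greater than 0.5 when performing ASE with the alternative branch-length set. Nonetheless, one in every four topologies contained a sensitive node. Node-based sensitivity was not easy to predict: random forests explained only 14.5% of its variance. Node depth still played a role in determining sensitivity (with shallower nodes being slightly more sensitive to branch-length choice; S1), as did the minimum number of descendants. One reason why sensitivity is harder to predict is that sensitive nodes seem to form clusters (Figures S2–S4), in which consecutive nodes flip their favoured state, in a kind of runaway effect, depending on the branch-length set used. None of the predictors incorporated information that could capture this interdependency.

Of …, … significant (Table S2). … 80% … (Figure 4). … dropped to 63%, outperforming … (Figure 4), … >99% of replicates. At worst, … 50% … 60%. … unusual in returning … proportion … (almost 40% … 20% … set), precluding them. Of those, … 65% … 56% …. Almost all statistics showed slight differences (‘biases’) in their ability to detect chronograms and phylograms; however, the scale of the bias differed vastly (S3). … biased, … significance … dataset, … (6% more) … phylograms. Bias … scale, consistent across … sets: 14%–25% …, 30%–45% … (AICc and BIC), 2%–4% … and 2% ….

The analysis investigating … (p < 2.2e-16; 0.29, Markov; 0.28, Hidden Rates; 0.34, Amp. Rates), meaning that improvements in … (… score) … (S5). … X3: …; X4: Branch homogeneity; X5: Evolutionary asymmetry; X6: Proportion … (S6). Due to high correlations, X4 and X5 were removed in favour of X3 and X6, to avoid excessive multicollinearity and keep results easy to interpret. Other correlations were not considered sufficiently high to warrant removal. Predictors retained in the procedure were X1: … size, X2: … length, X3, X6 and X7: Difference in best fit. All main effects were significant (p ≤ 0.00012; S4), with X7 and X2 having negative effects (Figure 5; S4). The highest variance inflation factor was 2.51, indicating that some interaction/correlation exists between … and the others (X3, as indicated by the correlation analysis: S6); this level of interaction is tolerable. Overall, … improves with … terminals (relative proportions) and decreases with … (see Discussion) (Figure 5).

Our core finding, that ASEs are more accurate with the branch-length set underlying a character's evolution, reinforces previous results for continuous (Litsios & Salamin, 2012) and discrete characters (Cusimano & Renner, 2014). The use of phylograms for ASE is, slowly, gaining recognition (…; Ramírez et al., 2021), … cases. While a direct link between the probability of change and the time elapsed is sensible, there are scenarios where this link might weaken. Some lineages show stability spanning hundreds of millions of years (Cavin & Guinot, …; Herrera-Flores et al., …), whereas others undergo bursts of change over short timeframes (Hopkins & Smith, 2015; Ronco et al., …). Such heterogeneity could weaken the temporal dependency of certain traits, and may be the cause (or consequence of) physiological, developmental or functional constraints (…; Brougham & Campione, 2020; Davies & Savolainen, 2006; Smith et al., …), or of a stronger link between genetic and phenotypic divergence (…, 2010).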
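The collinearity check reported in the results uses the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all other predictors. A minimal NumPy sketch of this ordinary-least-squares definition is given below for illustration; the function name is hypothetical, and the authors' exact VIF implementation for their logistic models is not stated in this text.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: 1 / (1 - R^2), where R^2 comes from an OLS
    regression (with intercept) of that column on all the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Three mutually orthogonal, centred predictors: every VIF is 1.
a = [1, -1, 1, -1]
b = [1, 1, -1, -1]
e = [1, -1, -1, 1]
print(variance_inflation_factors(np.column_stack([a, b, e])))
```

A VIF of 1 means a predictor shares no linear information with the others; values grow without bound as a predictor approaches an exact linear combination of the rest, so a maximum of 2.51 indicates only moderate overlap.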
light this, careful justification chosen, seems warranted if paramount At cases rare, quarter topologies. Among formed clusters, defining regions topology dependent assumptions (Figures common shallow depths (as deeper tend low insensitive made; importantly, characterized asymmetrical less shorter When employing node-to-tip distances lead slow-evolving mean observed them will ASEs; however reduced out. Empirical heterogenous (especially scales) likely require (although Reyes benefit objective criteria competing here. Our suggest like leads promising method Before (2012)), focused analysis, poorly, respectively. contrast, sets) (<5% cases). measure strength (Münkemüller always good indicator ‘realistic’, terms likelihood data. directly related incorporates generating (Posada Buckley, intuitive better realistic. negatively character, simple assume constant violated extent, especially employed life King Lee, 2015). phylogenies, complex Boyko Beaulieu, 2021) shift (Grundler Rabosky, allow heterogeneity. smaller preclude aware limitations caused namely (number taxa), length. data) optimization, AICc. due approach. simulating assigned randomly, independently meant longer experience branch, average. probably ‘saturation’ weakening content reducing consequently usefulness electing increase, fit chance decreases. reiterates provides, time, no here extend multi-state caveat extra parameters. weight indicates estimation. lines appropriate currently available. authors conceptualizing designing study; J.D.W. N.M.K. carried out analyses; manuscript writing revisions M.J.R. thank Thomas Guillerme anonymous reviewer many suggestions greatly improved manuscript. supported Postdoctoral Fellowship Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina Australian Biological Resources Study (ABRS) Taxonomy Research Grant (RG18-03); Yale University NSF project DEB-2036186; FONCyT grant PICT-2017-2689. 
We also thank the Arachnology team of the Museo Argentino de Ciencias Naturales, and Guilherme H. F. Azevedo, for providing feedback and ideas during the conception of the study. Ivan L. Magalhães provided preliminary scripts. The authors declare no conflict of interest. The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13872. Data and scripts are deposited in the Dryad Digital Repository https://doi.org/10.5061/dryad.z08kprrfk (Wilson et al., 2022).

Appendix S1. Supporting Information. Please note: the publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.
Similar resources
EREM: Parameter Estimation and Ancestral Reconstruction by Expectation-Maximization Algorithm for a Probabilistic Model of Genomic Binary Characters Evolution
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary...
A comparison of ancestral state reconstruction methods for quantitative characters.
Choosing an ancestral state reconstruction method among the alternatives available for quantitative characters may be puzzling. We present here a comparison of seven of them, namely the maximum likelihood, restricted maximum likelihood, generalized least squares under Brownian, Brownian-with-trend and Ornstein-Uhlenbeck models, phylogenetic independent contrasts and squared parsimony methods. A...
A Branch-Current-Based State Estimation Method for Distribution Systems
A branch-current-based three-phase state estimation (SE) method is proposed for distribution systems. The method is tailored for distribution feeders with a few loops. The method is computationally more efficient than the conventional node-voltage-based SE methods. To further improve the computational efficiency it is shown that distribution systems can be reduced without much loss of accuracy ...
Branch lengths and support.
Although technical definitions exist for various support metrics, the notion of support per se has received little explicit attention. Thus, despite its widespread use in phylogenetics, “support” is absent from the glossaries and/or indices of several recent texts (e.g., Kitching et al., 1998; Page and Holmes, 1998; Schuh, 2001). Farris et al. (2001) recently argued that interpreting branch len...
A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences
Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-he...
Journal
Journal title: Methods in Ecology and Evolution
Year: 2022
ISSN: 2041-210X
DOI: https://doi.org/10.1111/2041-210x.13872